Can continuous speech recognizers handle isolated speech?
Abstract
Continuous speech is far more natural and efficient than isolated speech for communication. However, for current state-of-the-art automatic speech recognition systems, isolated speech recognition (ISR) is far more accurate than continuous speech recognition (CSR). It is common practice in the speech research community to build CSR systems using only CS data. However, slowing of the speaking rate is a natural reaction for a user faced with the high error rates of current CSR systems. Ironically, CSR systems typically have a much higher word error rate when speakers slow down, since the acoustic models are usually derived exclusively from continuous speech corpora. In this paper, we summarize our efforts to improve the robustness of our speaker-independent CSR system against speaking styles, without suffering a recognition accuracy penalty. In particular, the multi-style trained system described in this paper attains a 7.0% word error rate for a test set consisting of both isolated and continuous speech, in contrast to the 10.9% word error rate achieved by the same system trained only on continuous speech. © 1998 Elsevier Science B.V. All rights reserved.
Similar resources
Discriminative feature weighting for HMM-based continuous speech recognizers
The Discriminative Feature Extraction (DFE) method provides an appropriate formalism for the design of the front-end feature extraction module in pattern classification systems. In recent years, this formalism has been successfully applied to different speech recognition problems, such as classification of vowels, classification of phonemes, or isolated word recognition. The DFE formalism can be...
Recognition of Prosodic Factors and Detection of Landmarks for Improvements to Continuous Speech Recognition Systems
This thesis examines the usefulness of including prosodic and phonetic context information in the phoneme model of a speech recognizer. This is done by creating a series of prosodic and phonetic models and then comparing the log likelihoods of each model. The comparison of log likelihoods shows that both prosodic and phonetic context information improve the phoneme model for most phonemes. The pro...
Automatic Generation of Pronunciation Dictionaries
In this report we describe a data-driven approach for creating pronunciation dictionaries for a new, unseen target language by voting among phoneme recognizers in nine different languages other than the target language. In this process, recordings of the new language that are transcribed at the word level are decoded by the phoneme recognizers. This results in a hypothesis of nine phonemes per t...
Dimensionality reduction of the enhanced feature set for the HMM-based speech recognizer
In the past few years, a great deal of research has been directed toward finding acoustic features that are effective for automatic speech recognition. Until recently, most speech recognizers used about 12 cepstral coefficients derived through linear prediction analysis as recognition features [1]. In [2,3], Furui investigated the use of temporal derivatives of cepstral coefficients...
A syllable based continuous speech recognizer for Tamil
This paper presents a novel technique for building a syllable-based continuous speech recognizer when unannotated transcribed training data is available. We present two different segmentation algorithms to segment the speech and the corresponding text into comparable syllable-like units. A group-delay-based two-level segmentation algorithm is proposed to extract accurate syllable units from the sp...
Journal title:
- Speech Communication
Volume 26
Publication date: 1997